    TheanoLM - An Extensible Toolkit for Neural Network Language Modeling

    We present a new tool for training neural network language models (NNLMs), scoring sentences, and generating text. The tool has been written using the Python library Theano, which allows researchers to easily extend it and tune any aspect of the training process. Despite this flexibility, Theano is able to generate extremely fast native code that can utilize a GPU or multiple CPU cores to parallelize the heavy numerical computations. The tool has been evaluated on difficult Finnish and English conversational speech recognition tasks, and significant improvements were obtained over our best back-off n-gram models. The results we obtained on the Finnish task were compared to those from the existing RNNLM and RWTHLM toolkits and found to be as good or better, while training times were an order of magnitude shorter.
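
    As a minimal illustration of what such a tool computes (a numpy sketch only; TheanoLM's actual API and architectures differ, and the toy vocabulary and random weights below are hypothetical), an RNN language model scores a sentence by summing the log-probability of each word given its history:

        import numpy as np

        # Toy vocabulary and randomly initialized weights; a real NNLM learns
        # these from data, and TheanoLM's actual API differs from this sketch.
        vocab = {"<s>": 0, "hello": 1, "world": 2, "</s>": 3}
        V, H = len(vocab), 8
        rng = np.random.default_rng(0)
        E = rng.normal(scale=0.1, size=(V, H))   # word embeddings
        W = rng.normal(scale=0.1, size=(H, H))   # recurrent weights
        U = rng.normal(scale=0.1, size=(H, V))   # output projection

        def log_prob(sentence):
            """Sum of log P(w_t | w_<t) under a simple RNN language model."""
            words = ["<s>"] + sentence.split() + ["</s>"]
            h = np.zeros(H)
            total = 0.0
            for prev, cur in zip(words, words[1:]):
                h = np.tanh(E[vocab[prev]] + W @ h)             # update hidden state
                logp = (h @ U) - np.log(np.exp(h @ U).sum())    # log-softmax
                total += logp[vocab[cur]]
            return total

        print(log_prob("hello world"))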

    Using stacked transformations for recognizing foreign accented speech

    A common problem in speech recognition of foreign-accented speech is that there is not enough training data for an accent-specific or speaker-specific recognizer. Speaker adaptation can be used to improve the accuracy of a speaker-independent recognizer, but a lot of adaptation data is needed for speakers with a strong foreign accent. In this paper we propose a rather simple and effective technique of stacked transformations, where the baseline models trained for native speakers are first adapted using accent-specific data and then by another transformation using speaker-specific data. Because the accent-specific data can be collected offline, the first transformation can be more detailed and comprehensive, while the second can be less detailed and fast. Experimental results are provided for speaker adaptation in English spoken by Finnish speakers. The evaluation results confirm that stacked transformations are very helpful for fast speaker adaptation.
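
    The stacking idea can be sketched as two affine (MLLR-style) transforms applied in sequence; the snippet below is an illustration only, with hypothetical shapes and random numbers, not the estimation procedure used in the paper:

        import numpy as np

        def apply_affine(means, A, b):
            """Apply one affine (MLLR-style) transform to Gaussian mean vectors."""
            return means @ A.T + b

        # Hypothetical model: 1000 Gaussian means of dimension 39.
        rng = np.random.default_rng(0)
        native_means = rng.normal(size=(1000, 39))

        # Transform 1: estimated offline from pooled accent-specific data,
        # so it can be detailed (e.g. many regression classes).
        A_accent = np.eye(39) + 0.01 * rng.normal(size=(39, 39))
        b_accent = rng.normal(size=39)

        # Transform 2: estimated quickly from a little speaker-specific data,
        # so it is kept coarse (e.g. one global transform).
        A_speaker, b_speaker = np.eye(39), 0.1 * rng.normal(size=39)

        # Stacking: adapt to the accent first, then to the speaker on top.
        adapted = apply_affine(apply_affine(native_means, A_accent, b_accent),
                               A_speaker, b_speaker)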

    Indexing Audio Documents by using Latent Semantic Analysis and SOM

    This paper describes an important application of state-of-the-art automatic speech recognition, natural language processing and information retrieval systems. Methods for enhancing the indexing of spoken documents by using latent semantic analysis and self-organizing maps are presented, motivated and tested. The idea is to extract extra information from the structure of the document collection and use it for more accurate indexing by generating new index terms and stochastic index weights. The indexing methods are evaluated on two broadcast news databases (one French and one English) using the average document perplexity defined in this paper and test queries analyzed by human experts.
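
    The latent semantic analysis step can be sketched as a truncated SVD of a term-document count matrix (a toy example only; the paper's SOM component and stochastic index weights are not reproduced here):

        import numpy as np

        # Toy term-document count matrix (terms x documents); a real index is
        # built from recognizer transcripts over a much larger vocabulary.
        X = np.array([[2, 0, 1, 0],
                      [1, 1, 0, 0],
                      [0, 3, 0, 1],
                      [0, 0, 2, 2]], dtype=float)

        # Latent semantic analysis: keep the k strongest singular directions.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        k = 2
        doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T   # documents in the semantic space

        # Smoothed term-document weights reconstructed from the subspace can
        # suggest new index terms that never occurred in the transcript.
        smoothed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
        print(smoothed.round(2))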

    Indexing spoken audio by LSA and SOMs

    This paper presents an indexing system for spoken audio documents. The framework is indexing and retrieval of broadcast news. The proposed indexing system applies latent semantic analysis (LSA) and self-organizing maps (SOM) to map the documents into a semantic vector space and to display the semantic structures of the document collection. The SOM is also used to enhance the indexing of documents that are difficult to decode. Relevant index terms and suitable index weights are computed by smoothing each document vector with other documents that are close to it in the semantic space. Experimental results are provided using the test data of TREC's spoken document retrieval (SDR) track.
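
    As a rough stand-in for the SOM-cluster smoothing described above (a sketch under the simplifying assumption that nearest neighbours in the semantic space play the role of the map's document clusters):

        import numpy as np

        def smooth_with_neighbours(doc_vecs, idx, k=3, alpha=0.5):
            """Mix a document vector with the mean of its k nearest neighbours
            in the semantic space (a stand-in for SOM-cluster smoothing)."""
            d = doc_vecs[idx]
            dists = np.linalg.norm(doc_vecs - d, axis=1)
            neighbours = np.argsort(dists)[1:k + 1]   # skip the document itself
            return alpha * d + (1 - alpha) * doc_vecs[neighbours].mean(axis=0)

        rng = np.random.default_rng(0)
        doc_vecs = rng.normal(size=(100, 10))         # hypothetical LSA vectors
        print(smooth_with_neighbours(doc_vecs, idx=0))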

    PUHEENTUNNISTUS (Speech Recognition)

    Automatic speech recognition is a major application of statistical and learning pattern recognition methods, one that holds a special position also because of its numerous applications familiar even to ordinary people. Although speech recognition is easy for humans, it is a very challenging problem because of the richness and diversity of the audio signal and of language. This article briefly presents the operating principles of modern speech recognizers, the mathematical foundations of the solutions, and the performance of the recognizers. In addition, a short overview of speech recognition research in Finland is given. Keywords: Speech recognition

    Fast latent semantic indexing of spoken documents by using self-organizing maps

    This paper describes a new latent semantic indexing (LSI) method for spoken audio documents. The framework is indexing broadcast news from radio and TV as a combination of large vocabulary continuous speech recognition (LVCSR), natural language processing (NLP) and information retrieval (IR). For indexing, the documents are represented as vectors of word counts, whose dimensionality is rapidly reduced by random mapping (RM). The obtained vectors are projected into the latent semantic subspace determined by singular value decomposition (SVD), where the vectors are then smoothed by a self-organizing map (SOM). The smoothing by the closest document clusters is important here, because the documents are often short and have a high word error rate (WER). As the clusters in the semantic subspace reflect the news topics, the SOMs provide an easy way to visualize the index and query results and to explore the database. Test results are reported for TREC's spoken document retrieval databases.
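
    The random-mapping step can be sketched as a fixed random projection of the count vectors (illustrative dimensions only; random mapping is often implemented with sparse random vectors, while a dense Gaussian projection is shown here for brevity):

        import numpy as np

        def random_mapping(counts, target_dim=100, seed=0):
            """Project word-count vectors to a low dimension with a fixed random
            matrix, roughly preserving similarities at a fraction of the cost of SVD."""
            rng = np.random.default_rng(seed)
            R = rng.normal(size=(counts.shape[1], target_dim)) / np.sqrt(target_dim)
            return counts @ R

        rng = np.random.default_rng(1)
        counts = rng.poisson(0.05, size=(50, 5000)).astype(float)   # toy documents
        print(counts.shape, "->", random_mapping(counts).shape)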

    Morphologically motivated word classes for very large vocabulary speech recognition of Finnish and Estonian

    We study class-based n-gram and neural network language models for very large vocabulary speech recognition of two morphologically rich languages: Finnish and Estonian. Due to morphological processes such as derivation, inflection and compounding, the models need to be trained with vocabulary sizes of several millions of word types. Class-based language modelling is in this case a powerful approach to alleviate the data sparsity and reduce the computational load. For a very large vocabulary, bigram statistics may not be an optimal way to derive the classes. We thus study utilizing the output of a morphological analyzer to achieve efficient word classes. We show that efficient classes can be learned by refining the morphological classes to smaller equivalence classes using merging, splitting and exchange procedures with suitable constraints. This type of classification can improve the results, particularly when language model training data is not very large. We also extend the previous analyses by rescoring the hypotheses obtained from a very large vocabulary recognizer using class-based neural network language models. We show that despite the fixed vocabulary, carefully constructed classes for word-based language models can in some cases result in lower error rates than subword-based unlimited vocabulary language models.
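
    The decomposition these class-based models rely on can be sketched as P(w | w_prev) = P(c(w) | c(w_prev)) * P(w | c(w)); the toy counts and word-to-class map below are hypothetical, and the paper's merging, splitting and exchange procedures are not shown:

        from collections import defaultdict

        # Hypothetical word-to-class map, e.g. seeded by a morphological analyzer.
        word2class = {"talo": "N", "talossa": "N", "juoksee": "V", "juoksi": "V"}

        # Toy counts; in practice these come from a large training corpus.
        class_bigram = defaultdict(int, {("N", "V"): 6, ("V", "N"): 3})
        class_unigram = defaultdict(int, {"N": 10, "V": 8})
        word_count = defaultdict(int, {"talo": 4, "talossa": 6, "juoksee": 5, "juoksi": 3})

        def p_word(word, prev_word):
            """P(w | w_prev) = P(c(w) | c(w_prev)) * P(w | c(w)) in a class bigram LM."""
            c, c_prev = word2class[word], word2class[prev_word]
            p_class = class_bigram[(c_prev, c)] / class_unigram[c_prev]
            p_member = word_count[word] / class_unigram[c]
            return p_class * p_member

        print(p_word("juoksee", "talossa"))   # 6/10 * 5/8 = 0.375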

    On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation

    Compositional generalisation (CG), in NLP and in machine learning more generally, has been assessed mostly using artificial datasets. It is important to develop benchmarks that assess CG in real-world natural language tasks as well, in order to understand the abilities and limitations of systems deployed in the wild. To this end, our GenBench Collaborative Benchmarking Task submission utilises the distribution-based compositionality assessment (DBCA) framework to split the Europarl translation corpus into a training and a test set in such a way that the test set requires compositional generalisation capacity. Specifically, the training and test sets have divergent distributions of dependency relations, testing NMT systems' capability of translating dependencies they have not been trained on. This is a fully automated procedure for creating natural language compositionality benchmarks, making it simple and inexpensive to apply to other datasets and languages. The code and data for the experiments are available at https://github.com/aalto-speech/dbca. (To appear at the GenBench Workshop at EMNLP 2023.)
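
    The divergence behind such splits can be sketched with the Chernoff coefficient used in the DBCA literature, D_alpha(p, q) = 1 - sum_k p_k^alpha * q_k^(1-alpha); the toy dependency counts below are hypothetical, and the repository above holds the actual splitting procedure:

        from collections import Counter

        def divergence(train_counts, test_counts, alpha=0.5):
            """1 - Chernoff coefficient between two normalized count distributions."""
            p_tot = sum(train_counts.values()) or 1
            q_tot = sum(test_counts.values()) or 1
            keys = set(train_counts) | set(test_counts)
            return 1.0 - sum((train_counts[k] / p_tot) ** alpha *
                             (test_counts[k] / q_tot) ** (1 - alpha)
                             for k in keys)

        # Toy dependency-relation counts; real ones come from parsed Europarl.
        train = Counter({("dog", "nsubj", "barks"): 5, ("cat", "nsubj", "sleeps"): 5})
        test = Counter({("dog", "nsubj", "sleeps"): 4, ("cat", "nsubj", "barks"): 4})
        print(divergence(train, test))   # 1.0: disjoint supports, maximal divergence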